Estimating K in Genetic Mixture Models
Abstract
A key quantity in the analysis of structured populations is the parameter K, which describes the number of subpopulations that make up the total population. Inference of K ideally proceeds via the model evidence, which is equivalent to the likelihood of the model (that is, the marginal likelihood, with all other parameters integrated out). However, the evidence in favour of a particular value of K cannot usually be computed exactly, and instead programs such as STRUCTURE make use of simple heuristic estimators to approximate this quantity. We show, using simulated data sets small enough that the true evidence can be computed exactly, that these simple heuristics often fail to estimate the true evidence, and that this can lead to incorrect conclusions about K. Our proposed solution is to use thermodynamic integration (TI) to estimate the model evidence. After outlining the TI methodology, we demonstrate the effectiveness of this approach using a range of simulated data sets. We find that TI can be used to obtain estimates of the model evidence that are orders of magnitude more accurate and precise than those based on simple heuristics. Furthermore, estimates of K based on these values are found to be more reliable than those based on a suite of model comparison statistics. Our solution is implemented for models both with and without admixture in the software TRUEK.
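To make the idea of thermodynamic integration concrete, the following is a minimal sketch on a toy problem, not the genetic mixture model or the authors' implementation. It assumes a conjugate normal model (hypothetical data `Y`, known standard deviation `SIGMA`, normal prior with parameters `M0` and `S0`), chosen only because its evidence has a closed form to compare against. TI uses the identity log Z = ∫₀¹ E_β[log p(y | θ)] dβ, where the expectation is taken under the "power posterior" proportional to p(y | θ)^β p(θ). In this toy case the power posterior can be sampled directly; a real mixture model would need MCMC at each value of β. All function names, the temperature ladder, and the number of draws are illustrative choices.

```python
import math
import random

# Hypothetical toy data: 10 observations, assumed Normal(mu, SIGMA^2).
Y = [0.8, 1.9, 0.3, 1.2, 2.1, 0.5, 1.4, 0.9, 1.7, 0.6]
SIGMA = 1.0          # known observation s.d. (assumption)
M0, S0 = 0.0, 2.0    # Normal(M0, S0^2) prior on mu (assumption)


def log_lik(mu):
    """Log-likelihood of the data for a given mean mu."""
    n = len(Y)
    ss = sum((y - mu) ** 2 for y in Y)
    return -0.5 * n * math.log(2 * math.pi * SIGMA ** 2) - ss / (2 * SIGMA ** 2)


def exact_log_evidence():
    """Closed-form log evidence: y ~ Normal(M0 * 1, SIGMA^2 I + S0^2 11^T)."""
    n = len(Y)
    d = [y - M0 for y in Y]
    sum_d = sum(d)
    sum_d2 = sum(x * x for x in d)
    log_det = n * math.log(SIGMA ** 2) + math.log(1 + n * S0 ** 2 / SIGMA ** 2)
    quad = (sum_d2 - S0 ** 2 * sum_d ** 2 / (SIGMA ** 2 + n * S0 ** 2)) / SIGMA ** 2
    return -0.5 * (n * math.log(2 * math.pi) + log_det + quad)


def power_posterior(beta):
    """Mean and s.d. of the power posterior, proportional to lik^beta * prior."""
    prec = 1 / S0 ** 2 + beta * len(Y) / SIGMA ** 2
    mean = (M0 / S0 ** 2 + beta * sum(Y) / SIGMA ** 2) / prec
    return mean, math.sqrt(1 / prec)


def ti_log_evidence(rungs=50, draws=2000, seed=1):
    """Estimate the log evidence by thermodynamic integration."""
    rng = random.Random(seed)
    # Ladder of temperatures, denser near beta = 0 where the integrand is steep.
    betas = [(i / rungs) ** 5 for i in range(rungs + 1)]
    expected = []
    for beta in betas:
        m, s = power_posterior(beta)
        # Monte Carlo estimate of E_beta[log-likelihood] (direct sampling here).
        total = sum(log_lik(rng.gauss(m, s)) for _ in range(draws))
        expected.append(total / draws)
    # Trapezoidal rule over beta in [0, 1].
    return sum(
        0.5 * (betas[i + 1] - betas[i]) * (expected[i + 1] + expected[i])
        for i in range(rungs)
    )


if __name__ == "__main__":
    print("TI estimate :", ti_log_evidence())
    print("Exact value :", exact_log_evidence())
```

In this sketch the two printed values should agree closely. The remaining gap comes from the finite temperature ladder and from Monte Carlo noise, and both shrink as `rungs` and `draws` increase.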